Skip to content

fix(opencode): honor --since/--until in SQLite loader and JSON dump scan#1188

Closed
justi wants to merge 1 commit into
ccusage:mainfrom
justi:fix/opencode-since-until-adapter
Closed

fix(opencode): honor --since/--until in SQLite loader and JSON dump scan#1188
justi wants to merge 1 commit into
ccusage:mainfrom
justi:fix/opencode-since-until-adapter

Conversation

@justi

@justi justi commented May 29, 2026

Copy link
Copy Markdown

Summary

ccusage opencode (and the aggregate ccusage) reads every row from the OpenCode SQLite message table on every invocation, regardless of --since / --until, and then re-reads every JSON file under storage/message/*.json. On long-lived OpenCode installs (especially those migrated from the pre-SQLite layout) the loader can hang for minutes or appear to lock up.

SharedArgs.since / SharedArgs.until were already plumbed into the adapter (ccusage-cli/src/types.rs:34-35), but the loader did not use them.

Reproduction (local data, before this PR)

On main @ 9d90e1b, opencode storage with:

  • opencode.db: 33.6 GB, 392,340 rows in message
  • storage/message/: 118,619 JSON files (legacy pre-SQLite dump, ~2.0 GB)
$ ccusage opencode --since 2026-05-04 --until 2026-05-10
# never returns; the loader scans 392k DB rows + parses 118k JSON files

Raw SQL with WHERE time_created BETWEEN ? AND ? against the same DB returns in <50 ms.

Fix

rust/crates/ccusage/src/adapter/opencode/loader.rs:

  1. SQLite query uses the existing index. The message table already has CREATE INDEX message_session_time_created_id_idx ON message (session_id, time_created, id). When shared.since / shared.until are set, the prepared statement now adds WHERE time_created >= ?1 AND time_created < ?2 (or open-ended variants).

  2. storage/message/*.json loop skips by mtime. Files outside the inferred range are skipped via fs::metadata(...).modified() before any read/parse.

  3. Slack. since/until are inflated by -1 / +2 days before being applied, so the existing string-based summary filter (summary.rs:265-271) remains authoritative; the SQL/mtime checks only short-circuit work that the summary would have discarded anyway. This absorbs timezone offsets and any FAT32-style 2-second mtime rounding.

Performance on local data

Variant Runtime (--since 2026-05-04 --until 2026-05-10)
main @ 9d90e1b did not return
this PR, SQL WHERE only 186.68 s
this PR, SQL WHERE + mtime-skip 9.57 s

Loaded tokens are identical across variants and match a hand-written sqlite3 query against the same DB: 414,644 input / 4,646 output.

Cross-OS compatibility

Runtime: std::fs::metadata().modified() works on macOS APFS/HFS+, Linux ext4/btrfs/xfs, and Windows NTFS. FAT32's 2-second mtime granularity is covered by the slack.

Tests: new tests use the filetime crate (FileTime::from_unix_time) which maps to utimensat (Linux), setattrlist (macOS), and SetFileTime (Windows). Same code path runs on all three CI targets.

Tests added

4 new tests in adapter::opencode::loader::tests:

  • since_filter_drops_db_rows_older_than_lower_bound
  • until_filter_drops_db_rows_at_or_after_upper_bound
  • since_filter_skips_json_files_with_older_mtime
  • no_since_until_keeps_all_json_files_regardless_of_mtime

All 285 workspace tests pass; cargo clippy --workspace --all-targets -- -D warnings is clean; cargo fmt --all -- --check is clean.

Why this is a fresh narrow PR

The same user-visible gap was reported in #801, requested in #867, and previously fixed against the now-removed TypeScript @ccusage/opencode package in #960 (closed with: "If a gap remains, it needs a fresh narrow PR against main"). This PR is that narrow PR against the current Rust adapter. No CLI surface, docs, or configuration schema changes.


Summary by cubic

Make ccusage opencode honor --since/--until in the OpenCode adapter. This stops full DB scans and legacy JSON parsing, cutting runtime on large installs from “never returns” to seconds.

  • Bug Fixes

    • SQLite: add time bounds in the query (open-ended variants) and bind params; uses the existing time_created index.
    • JSON dump: skip storage/message/*.json by file mtime before reading.
    • Safety: widen bounds by -1/+2 days so the summary-time filter remains authoritative and tolerates TZ drift and coarse mtimes.
    • Tests: add 4 tests for SQL bounds and mtime skipping; all checks pass.
  • Dependencies

    • Add dev-dependency filetime for setting mtimes in tests.

Written for commit e36d426. Summary will update on new commits.

Review in cubic

Summary by CodeRabbit

Release Notes

  • New Features

    • Date-range filtering is now available when loading OpenCode data from SQLite databases and JSON files, including configurable slack window support that extends specified time boundaries.
  • Tests

    • Test coverage expanded to validate date-range filtering functionality, including verification of time-based row exclusion from SQLite sources and file modification time-based filtering for JSON files.

Review Change Stack

The OpenCode adapter loaded every row from `opencode.db` and parsed every
file under `storage/message/*.json` on every invocation, regardless of
the `--since` / `--until` filters that `SharedArgs` already carries.
On long-lived installs (33 GB DB, 100k+ legacy JSON files) the loader
appeared to hang.

- SQLite query now uses the existing `time_created` index when bounds
  are set: `WHERE time_created >= ?1 AND time_created < ?2` (plus the
  open-ended variants).
- The `storage/message/*.json` loop now skips files whose `mtime` is
  outside the inferred range, before any read or parse.
- Bounds are inflated by -1 / +2 days so the existing summary-time
  filter (`summary.rs:265-271`) remains authoritative; the SQL/mtime
  short-circuits never drop legitimate rows under timezone offsets
  or FAT32-style 2-second mtime rounding.

Adds 4 tests in `adapter::opencode::loader::tests` covering the
SQL bounds and mtime skip paths. All 285 workspace tests pass;
`cargo clippy --workspace --all-targets -- -D warnings` is clean;
`cargo fmt --all -- --check` is clean.

Refs ccusage#801, ccusage#867, ccusage#960.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

This PR was auto-closed. Only contributors approved with lgtm can open PRs. Open an issue first.

Maintainers review auto-closed issues and reopen worthwhile ones. Issues that do not meet the quality bar in CONTRIBUTING.md may not be reopened or receive a reply.

If a maintainer replies lgtmi, your future issues will stay open. If a maintainer replies lgtm, your future issues and PRs will stay open.

See CONTRIBUTING.md.

@github-actions github-actions Bot closed this May 29, 2026
@coderabbitai

coderabbitai Bot commented May 29, 2026

Copy link
Copy Markdown

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4b350b27-a045-441a-b2cf-0fca35c82a04

📥 Commits

Reviewing files that changed from the base of the PR and between 9d90e1b and e36d426.

⛔ Files ignored due to path filters (1)
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • rust/crates/ccusage/Cargo.toml
  • rust/crates/ccusage/src/adapter/opencode/loader.rs

📝 Walkthrough

Walkthrough

This PR adds time-based filtering with slack-adjusted bounds to the OpenCode data loader. Date bounds are parsed once from YYYYMMDD strings into UTC milliseconds, then applied to skip JSON files and SQLite rows outside the range. Tests validate filtering behavior across both data sources.

Changes

Date-range filtering with slack windows

Layer / File(s) Summary
Time utilities and parsing
rust/crates/ccusage/src/adapter/opencode/loader.rs
Adds imports for UNIX_EPOCH and time zone utilities; defines MS_PER_DAY and helpers to parse YYYYMMDD strings into UTC epoch milliseconds and compute slack-adjusted (since_ms, until_ms) bounds (since minus 1 day, until plus 2 days).
JSON message file filtering via mtime
rust/crates/ccusage/src/adapter/opencode/loader.rs
Computes date bounds once inside load_entries_from_directory and applies mtime-based filtering before reading JSON files; files outside the range are skipped.
SQLite database filtering via time_created
rust/crates/ccusage/src/adapter/opencode/loader.rs
Modifies load_entries_from_database to filter message table rows by time_created using conditional SQL with bound parameters; handles prepare/bind failures by logging and returning empty results.
Test helper, dependency, and coverage
rust/crates/ccusage/Cargo.toml, rust/crates/ccusage/src/adapter/opencode/loader.rs
Adds filetime = "0.2" dev dependency; introduces create_db_message_with_time test helper for creating SQLite rows with explicit timestamps; validates DB filtering by since/until and JSON mtime filtering when bounds are set.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

A rabbit hops through time with glee, 🐰
Filtering dates from JSON and DB with ease,
Slack windows buffering days,
In mtime and time_created ways—
No old records shall escape, you'll see! ✨

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@socket-security

Copy link
Copy Markdown

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addedcargo/​filetime@​0.2.2910010093100100

View full report

@ryoppippi

Copy link
Copy Markdown
Collaborator

Thanks for the careful writeup — the motivation is solid and the SQL path is well reasoned. One correctness concern and a couple of robustness/nit items before this is safe to merge.

Context

The authoritative date filter is filter_loaded_entries_by_date (adapter/claude/mod.rs:121), which runs after load and filters on each entry's logical date (time.created converted to TZ). So the new loader filtering is purely a load-time optimization, and the only correctness requirement is: the loader must never drop an entry that filter_loaded_entries_by_date would keep. Dropping extra rows just makes it slower; dropping kept rows silently under-counts.

✅ SQL WHERE time_created — looks correct

The -1 day / +2 day slack absorbs TZ offset (max ±14h < 1 day) and any small drift between the time_created column and JSON data.time.created, so it's always wider than the logical-date filter. No risk of dropping kept rows, and it rides the existing index. 👍

⚠️ JSON mtime skip on the --until side — silent under-count risk

File mtime is not guaranteed to match the message's logical time. cp -r, rsync without -a, backup restore, and cloud sync all reset mtime to "now". That's exactly the "legacy pre-SQLite dump migrated to a new machine" scenario this PR targets.

  • --since (skip when modified_ms < min): if mtime is reset newer than real, the file is not skipped → no loss. Reasonably safe.
  • --until (skip when modified_ms >= max): if a legacy file's mtime is reset to "now" but its logical date is in the past, an --until <past date> query will skip the file even though its logical date is in range → silent under-count.

"The summary filter is authoritative" does not protect this path, because the skip happens before load, so dropped rows never reach the authoritative filter. The new tests set mtime explicitly and so don't exercise the mtime-vs-logical-time divergence.

Suggested options:

  • Drop the --until mtime skip (the --since skip alone should capture most of the 186s → 9.57s win; legacy data is old and rarely hits the upper bound anyway), or
  • Make the mtime skip explicitly opt-in/heuristic and document that --until may under-count on environments where files were copied without preserving timestamps.

⚠️ DB schema without time_created — silent empty DB results

If an older/variant OpenCode schema lacks time_created, connection.prepare(sql) fails and the else branch returns Vec::new(). That means DB entries silently disappear only when --since/--until is set, while the no-filter path still works — an asymmetric regression. Consider falling back to the unfiltered query on prepare failure.

Nits

  • The JSON loop calls fs::metadata(&file) twice per file (once for since, once for until). With 118k files that's a lot of extra stats — fold into a single metadata call.
  • The two mtime blocks are near-duplicates; a small helper would read better.
  • filetime as a dev-dependency only is fine.

Net: SQL path is good to go; the --until mtime skip and the prepare-failure fallback are the two things I'd want addressed first.

@ryoppippi

Copy link
Copy Markdown
Collaborator

@justi heads up — #1220 (now closed) tackled the same issue with a different JSON-side strategy that I think is worth pulling into this PR.

Instead of skipping JSON files by file mtime, it extracts the real time.created millis straight from the raw JSON string and runs the exact same date check the authoritative filter_loaded_entries_by_date uses (format_date_tz → compare against since/until). That sidesteps the mtime concern I raised above entirely: it's exact, needs no -1/+2 day slack, and if extraction fails it falls through to a full parse (fails open), so it can never drop a row the authoritative filter would keep.

Your SQLite WHERE time_created push-down is the better half — it's the part that actually uses the index and gives the big speedup, which #1220 lacked. So the ideal is the combination:

That keeps the speedup while removing the silent under-count risk. Closing #1220 so this stays the single canonical PR — would you be up for folding that JSON approach in here? Happy to point you at the exact extract_message_timestamp / timestamp_within_range helpers from that PR.

@ryoppippi ryoppippi closed this Jun 15, 2026
@justi

justi commented Jun 17, 2026

Copy link
Copy Markdown
Author

Following up on your suggestion to fold #1220's JSON strategy into this PR.

I've combined both halves as you outlined:

  • SQLite: kept the indexed WHERE time_created push-down with the -1d/+2d slack (the real index speedup).
  • JSON files: replaced the mtime skip with feat(opencode): pre-filter messages by --since/--until in loader #1220's content-based time.created extraction, running the same format_date_tz check as the authoritative filter_loaded_entries_by_date and failing open to a full parse. This removes the mtime under-count risk you raised.

I also addressed the prepare-failure fallback: a schema without time_created now falls back to an unfiltered scan (the in-loop date check still applies) instead of returning no rows.

Verified on a real 33 GB / 395k-row opencode.db: the time_created column equals the payload time.created in all 395,289 rows (max delta 0 ms), so the push-down can't drop a row the authoritative filter would keep. A one-month --since/--until window returns correct totals; the full unit suite, clippy, and fmt are clean.

Would you be willing to reopen this (or drop an lgtm) so I can push the combined version?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants